Data augmentation and enhancement for multimodal speech emotion recognition

نویسندگان

چکیده

Humans’ fundamental need is interaction with each other such as using conversation or speech. Therefore, it crucial to analyze speech computer technology determine emotions. The emotion recognition (SER) method detects emotions in by examining various aspects. SER a supervised decide the class This research proposed multimodal model one of deep learning based enhancement techniques, which attention mechanism. Additionally, this addresses imbalanced dataset problem field generative adversarial networks (GAN) data augmentation technique. achieved an excellent evaluation performance 0.96 96% for GAN configuration. work showed that could enhance and create balanced dataset.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multimodal emotion recognition from expressive faces, body gestures and speech

In this paper we present a multimodal approach for the recognition of eight emotions that integrates information from facial expressions, body movement and gestures and speech. We trained and tested a model with a Bayesian classifier, using a multimodal corpus with eight emotions and ten subjects. First individual classifiers were trained for each modality. Then data were fused at the feature l...

متن کامل

Speech Emotion Recognition with Data Augmentation and Layer-wise Learning Rate Adjustment

In this work, we design a neural network for recognizing emotions in speech, using the standard IEMOCAP dataset. Following the latest advances in audio analysis, we use an architecture involving both convolutional layers, for extracting highlevel features from raw spectrograms, and recurrent ones for aggregating long-term dependencies. Applying techniques of data augmentation, layerwise learnin...

متن کامل

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

متن کامل

Multimodal Emotion Recognition Integrating Affective Speech with Facial Expression

In recent years, emotion recognition has attracted extensive interest in signal processing, artificial intelligence and pattern recognition due to its potential applications to human-computer-interaction (HCI). Most previously published works in the field of emotion recognition devote to performing emotion recognition by using either affective speech or facial expression. However, Affective spe...

متن کامل

Multimodal Emotion Recognition

Multimodal fusion is the process whereby two or more forms of input are gathered together in order to produce a higher overall classification accuracy than individual unimodal systems. This is a popular technique in emotion recognition. In this study, we attempted to discover how much we could improve upon individual unimodal systems using decision level fusion. To accomplish this, we acquired ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Bulletin of Electrical Engineering and Informatics

سال: 2023

ISSN: ['2302-9285']

DOI: https://doi.org/10.11591/eei.v12i5.5031